Search Results: "mt"

26 August 2023

Christoph Berg: PostgreSQL Popularity Contest

Back in 2015, when PostgreSQL 9.5 alpha 1 was released, I had posted the PostgreSQL data from Debian's popularity contest. 8 years and 8 PostgreSQL releases later, the graph now looks like this: Currently, the most popular PostgreSQL on Debian systems is still PostgreSQL 13 (shipped in Bullseye), followed by PostgreSQL 11 (Buster). At the time of writing, PostgreSQL 9.6 (Stretch) and PostgreSQL 15 (Bookworm) share the third place, with 15 rising quickly.

21 August 2023

Melissa Wen: AMD Driver-specific Properties for Color Management on Linux (Part 1)

TL;DR: Color is a visual perception. Human eyes can detect a broader range of colors than any devices in the graphics chain. Since each device can generate, capture or reproduce a specific subset of colors and tones, color management controls color conversion and calibration across devices to ensure a more accurate and consistent color representation. We can expose a GPU-accelerated display color management pipeline to support this process and enhance results, and this is what we are doing on Linux to improve color management on Gamescope/SteamDeck. Even with the challenges of being external developers, we have been working on mapping AMD GPU color capabilities to the Linux kernel color management interface, which is a combination of DRM and AMD driver-specific color properties. This more extensive color management pipeline includes pre-defined Transfer Functions, 1-Dimensional LookUp Tables (1D LUTs), and 3D LUTs before and after the plane composition/blending.
The study of color is well-established and has been explored for many years. Color science and research findings have also guided technology innovations. As a result, color in Computer Graphics is a very complex topic that I m putting a lot of effort into becoming familiar with. I always find myself rereading all the materials I have collected about color space and operations since I started this journey (about one year ago). I also understand how hard it is to find consensus on some color subjects, as exemplified by all explanations around the 2015 online viral phenomenon of The Black and Blue Dress. Have you heard about it? What is the color of the dress for you? So, taking into account my skills with colors and building consensus, this blog post only focuses on GPU hardware capabilities to support color management :-D If you want to learn more about color concepts and color on Linux, you can find useful links at the end of this blog post.

Linux Kernel, show me the colors ;D DRM color management interface only exposes a small set of post-blending color properties. Proposals to enhance the DRM color API from different vendors have landed the subsystem mailing list over the last few years. On one hand, we got some suggestions to extend DRM post-blending/CRTC color API: DRM CRTC 3D LUT for R-Car (2020 version); DRM CRTC 3D LUT for Intel (draft - 2020); DRM CRTC 3D LUT for AMD by Igalia (v2 - 2023); DRM CRTC 3D LUT for R-Car (v2 - 2023). On the other hand, some proposals to extend DRM pre-blending/plane API: DRM plane colors for Intel (v2 - 2021); DRM plane API for AMD (v3 - 2021); DRM plane 3D LUT for AMD - 2021. Finally, Simon Ser sent the latest proposal in May 2023: Plane color pipeline KMS uAPI, from discussions in the 2023 Display/HDR Hackfest, and it is still under evaluation by the Linux Graphics community. All previous proposals seek a generic solution for expanding the API, but many seem to have stalled due to the uncertainty of matching well the hardware capabilities of all vendors. Meanwhile, the use of AMD color capabilities on Linux remained limited by the DRM interface, as the DCN 3.0 family color caps and mapping diagram below shows the Linux/DRM color interface without driver-specific color properties [*]: Bearing in mind that we need to know the variety of color pipelines in the subsystem to be clear about a generic solution, we decided to approach the issue from a different perspective and worked on enabling a set of Driver-Specific Color Properties for AMD Display Drivers. As a result, I recently sent another round of the AMD driver-specific color mgmt API. For those who have been following the AMD driver-specific proposal since the beginning (see [RFC][V1]), the main new features of the latest version [v2] are the addition of pre-blending Color Transformation Matrix (plane CTM) and the differentiation of Pre-defined Transfer Functions (TF) supported by color blocks. For those who just got here, I will recap this work in two blog posts. This one describes the current status of the AMD display driver in the Linux kernel/DRM subsystem and what changes with the driver-specific properties. In the next post, we go deeper to describe the features of each color block and provide a better picture of what is available in terms of color management for Linux.

The Linux kernel color management API and AMD hardware color capabilities Before discussing colors in the Linux kernel with AMD hardware, consider accessing the Linux kernel documentation (version 6.5.0-rc5). In the AMD Display documentation, you will find my previous work documenting AMD hardware color capabilities and the Color Management Properties. It describes how AMD Display Manager (DM) intermediates requests between the AMD Display Core component (DC) and the Linux/DRM kernel interface for color management features. It also describes the relevant function to call the AMD color module in building curves for content space transformations. A subsection also describes hardware color capabilities and how they evolve between versions. This subsection, DC Color Capabilities between DCN generations, is a good starting point to understand what we have been doing on the kernel side to provide a broader color management API with AMD driver-specific properties.

Why do we need more kernel color properties on Linux? Blending is the process of combining multiple planes (framebuffers abstraction) according to their mode settings. Before blending, we can manage the colors of various planes separately; after blending, we have combined those planes in only one output per CRTC. Color conversions after blending would be enough in a single-plane scenario or when dealing with planes in the same color space on the kernel side. Still, it cannot help to handle the blending of multiple planes with different color spaces and luminance levels. With plane color management properties, userspace can get a more accurate representation of colors to deal with the diversity of color profiles of devices in the graphics chain, bring a wide color gamut (WCG), convert High-Dynamic-Range (HDR) content to Standard-Dynamic-Range (SDR) content (and vice-versa). With a GPU-accelerated display color management pipeline, we can use hardware blocks for color conversions and color mapping and support advanced color management. The current DRM color management API enables us to perform some color conversions after blending, but there is no interface to calibrate input space by planes. Note that here I m not considering some workarounds in the AMD display manager mapping of DRM CRTC de-gamma and DRM CRTC CTM property to pre-blending DC de-gamma and gamut remap block, respectively. So, in more detail, it only exposes three post-blending features:
  • DRM CRTC de-gamma: used to convert the framebuffer s colors to linear gamma;
  • DRM CRTC CTM: used for color space conversion;
  • DRM CRTC gamma: used to convert colors to the gamma space of the connected screen.

AMD driver-specific color management interface We can compare the Linux color management API with and without the driver-specific color properties. From now, we denote driver-specific properties with the AMD prefix and generic properties with the DRM prefix. For visual comparison, I bring the DCN 3.0 family color caps and mapping diagram closer and present it here again: Mixing AMD driver-specific color properties with DRM generic color properties, we have a broader Linux color management system with the following features exposed by properties in the plane and CRTC interface, as summarized by this updated diagram: The blocks highlighted by red lines are the new properties in the driver-specific interface developed by me (Igalia) and Joshua (Valve). The red dashed lines are new links between API and AMD driver components implemented by us to connect the Linux/DRM interface to AMD hardware blocks, mapping components accordingly. In short, we have the following color management properties exposed by the DRM/AMD display driver:
  • Pre-blending - AMD Display Pipe and Plane (DPP):
    • AMD plane de-gamma: 1D LUT and pre-defined transfer functions; used to linearize the input space of a plane;
    • AMD plane CTM: 3x4 matrix; used to convert plane color space;
    • AMD plane shaper: 1D LUT and pre-defined transfer functions; used to delinearize and/or normalize colors before applying 3D LUT;
    • AMD plane 3D LUT: 17x17x17 size with 12 bit-depth; three dimensional lookup table used for advanced color mapping;
    • AMD plane blend/out gamma: 1D LUT and pre-defined transfer functions; used to linearize back the color space after 3D LUT for blending.
  • Post-blending - AMD Multiple Pipe/Plane Combined (MPC):
    • DRM CRTC de-gamma: 1D LUT (can t be set together with plane de-gamma);
    • DRM CRTC CTM: 3x3 matrix (remapped to post-blending matrix);
    • DRM CRTC gamma: 1D LUT + AMD CRTC gamma TF; added to take advantage of driver pre-defined transfer functions;
Note: You can find more about AMD display blocks in the Display Core Next (DCN) - Linux kernel documentation, provided by Rodrigo Siqueira (Linux/AMD display developer) in a 2021-documentation series. In the next post, I ll revisit this topic, explaining display and color blocks in detail.

How did we get a large set of color features from AMD display hardware? So, looking at AMD hardware color capabilities in the first diagram, we can see no post-blending (MPC) de-gamma block in any hardware families. We can also see that the AMD display driver maps CRTC/post-blending CTM to pre-blending (DPP) gamut_remap, but there is post-blending (MPC) gamut_remap (DRM CTM) from newer hardware versions that include SteamDeck hardware. You can find more details about hardware versions in the Linux kernel documentation/AMDGPU Product Information. I needed to rework these two mappings mentioned above to provide pre-blending/plane de-gamma and CTM for SteamDeck. I changed the DC mapping to detach stream gamut remap matrixes from the DPP gamut remap block. That means mapping AMD plane CTM directly to DPP/pre-blending gamut remap block and DRM CRTC CTM to MPC/post-blending gamut remap block. In this sense, I also limited plane CTM properties to those hardware versions with MPC/post-blending gamut_remap capabilities since older versions cannot support this feature without clashes with DRM CRTC CTM. Unfortunately, I couldn t prevent conflict between AMD plane de-gamma and DRM plane de-gamma since post-blending de-gamma isn t available in any AMD hardware versions until now. The fact is that a post-blending de-gamma makes little sense in the AMD color pipeline, where plane blending works better in a linear space, and there are enough color blocks to linearize content before blending. To deal with this conflict, the driver now rejects atomic commits if users try to set both AMD plane de-gamma and DRM CRTC de-gamma simultaneously. Finally, we had no other clashes when enabling other AMD driver-specific color properties for our use case, Gamescope/SteamDeck. Our main work for the remaining properties was understanding the data flow of each property, the hardware capabilities and limitations, and how to shape the data for programming the registers - AMD color block capabilities (and limitations) are the topics of the next blog post. Besides that, we fixed some driver bugs along the way since it was the first Linux use case for most of the new color properties, and some behaviors are only exposed when exercising the engine. Take a look at the Gamescope/Steam Deck Color Pipeline[**], and see how Gamescope uses the new API to manage color space conversions and calibration (please click on the image for a better view): In the next blog post, I ll describe the implementation and technical details of each pre- and post-blending color block/property on the AMD display driver. * Thank Harry Wentland for helping with diagrams, color concepts and AMD capabilities. ** Thank Joshua Ashton for providing and explaining Gamescope/Steam Deck color pipeline. *** Thanks to the Linux Graphics community - explicitly Harry, Joshua, Pekka, Simon, Sebastian, Siqueira, Alex H. and Ville - to all the learning during this Linux DRM/AMD color journey. Also, Carlos and Tomas for organizing the 2023 Display/HDR Hackfest where we have a great and immersive opportunity to discuss Color & HDR on Linux.

Jonathan Dowland: FreshRSS

Now that it's more convenient for me to run containers at home, I thought I'd write a bit about web apps I am enjoying. First up, FreshRSS, a web feed aggregator. I used to make heavy use of Google Reader until Google killed it, and although a bunch of self-hosted cloned sprung up very quickly afterwards, I didn't transition to any of them. Then followed a number of years within which, in retrospect, I basically didn't do a great job of organising reading the web. This lasted until a couple of years ago when, on a whim, I tried out NetNewsWire for iOS. NetNewsWire is a well-established and much-loved feed reader for Mac which I've never used. I used the iOS version in isolation for a long time: only dipping into web feeds on my phone and never on another device. I'd like to see the old web back, and do my part to make that happen. I've continually published rss and atom feeds for my own blog. I'm also trying to blog more: this might be easier now that Twitter (which, IMHO, took a lot of the energy for quick writing out of blogging) is mortally wounded. So, I'm giving FreshRSS a go. Early signs are good: it's fast, lightweight, easy to use, vaguely resembles how I remember Google Reader, and has a native dark-mode. It's still early days building up a new list of feeds to follow. I'll be sure to share interesting ones as I discover them!

20 August 2023

Russell Coker: GPT Systems and Relationships

Sam Hartman wrote an interesting blog post about his work as a sex and intimacy educator and how GPT systems could impact that [1]. I ve read some positive reviews of Replika a commercial system that is somewhat promoted as a counsellor [2], so I decided to try it out. In my brief trial it seemed to be using all the methods that Android pay to play games are known for. Having multiple types of in-game currency, pay to buy new clothes etc for your friend, etc. Basically it seems pretty horrible. I didn t pay for it and the erotic and romantic features all require payment so I didn t test that. When thinking about this logically, having a system designed to deal with people when they are vulnerable (either being in a romantic relationship or getting counselling) that uses manipulative techniques to get money from them can t have a good result. So a free software system seems the best option. When I first learned of virtual girlfriends I never thought I would feel compelled to advocate for a free software virtual dating program, but that s where the world has got to. Virtual girlfriends have been around for years now. Several years ago I watched a documentary about their use in Japan. It seemed a bit strange when a group of men who had virtual girlfriends had a dinner party with their tablets and phones propped up so their girlfriends could join in as they all appeared to be dating the same girl. The documentary didn t go in to enough detail to cover whether the girlfriend app could learn or be customised enough that they would seem to have different personalities. Virtual boyfriends have also been around for a while apparently without most people noticing. I just Googled it and found a review of a virtual boyfriend app published in 2016! One thing that will probably concern people is the possibility for virtual dating systems to be used for inappropriate things. That is a reasonable thing to be concerned about but I don t think it s possible to prevent technology that has already been released from doing such things. As a general rule technology can always be used for good and bad things so we need to just make it easy to do good things and let the legal system develop ways of dealing with the bad things.

13 August 2023

Jonathan Dowland: Terrain base for 3D castle

terrain base for the castle
I designed and printed a "terrain" base for my 3D castle in OpenSCAD. The castle was the first thing I designed and printed on our (then new) office 3D printer. I use it as a test bed if I want to try something new, and this time I wanted to try procedurally generating a model. I've released the OpenSCAD source for the terrain generator under the name Zarchscape. mid 90s terrain generation
Lots of mid-90s games had very boxy floors Lots of mid-90s games had very boxy floors
Terrain generation, 90s-style. From [this article](https://web.archive.org/web/19990822085321/http://www.gamedesign.net/tutorials/pavlock/cool-ass-terrain/) Terrain generation, 90s-style. From this article
Back in the 90s I spent some time designing maps/levels/arenas for Quake and its sibling games (like Half-Life), mostly in the tool Worldcraft. A lot of beginner maps (including my own), ended up looking pretty boxy. I once stumbled across an article blog post that taught my a useful trick for making more natural-looking terrain. In brief: tessellate the floor region with triangle polygons, then randomly add some jitter to the z-dimension for their vertices. A really simple technique with fairly dramatic results. OpenSCAD Doing the same in OpenSCAD stretched me, and I think stretched OpenSCAD. It left me with some opinions which I'll try to write up in a future blog post. Final results
multicolour
I've generated and printed the result a couple of times, including an attempt a multicolour print. At home, I have a large spool of brown-coloured recycled PLA, and many small lengths of samples in various colours (that I picked up at Maker Faire Czech Republic last year), including some short lengths of green. My home printer is a Prusa Mini, and I cheaped out and didn't buy the filament runout sensor, which would detect when the current filament ran out and let me handle the situation gracefully. Instead, I added several colour change instructions to the g-code at various heights, hoping that whatever plastic I loaded for each layer was enough to get the print to the next colour change instruction. The results are a little mixed I think. I didn't catch the final layer running out in time (forgetting that the Bowden tube also means I need to catch it running out before the loading gear, a few inches earlier than the nozzle), so the final lush green colour ends prematurely. I've also got a fair bit of stringing to clean up. Finally, all these non-flat planes really show up some of the limitations of regular Slicing. It would be interesting to try this with a non-planar Slicer.

9 August 2023

Antoine Beaupr : OpenPGP key transition

This is a short announcement to say that I have changed my main OpenPGP key. A signed statement is available with the cryptographic details but, in short, the reason is that I stopped using my old YubiKey NEO that I have worn on my keyring since 2015. I now have a YubiKey 5 which supports ED25519 which features much shorter keys and faster decryption. It allowed me to move all my secret subkeys on the key (including encryption keys) while retaining reasonable performance. I have written extensive documentation on how to do that OpenPGP key rotation and also YubiKey OpenPGP operations.

Warning on storing encryption keys on a YubiKey People wishing to move their private encryption keys to such a security token should be very careful as there are special precautions to take for disaster recovery. I am toying with the idea of writing an article specifically about disaster recovery for secrets and backups, dealing specifically with cases of death or disabilities.

Autocrypt changes One nice change is the impact on Autocrypt headers, which are considerably shorter. Before, the header didn't even fit on a single line in an email, it overflowed to five lines:
Autocrypt: addr=anarcat@torproject.org; prefer-encrypt=nopreference;
 keydata=xsFNBEogKJ4BEADHRk8dXcT3VmnEZQQdiAaNw8pmnoRG2QkoAvv42q9Ua+DRVe/yAEUd03EOXbMJl++YKWpVuzSFr7IlZ+/lJHOCqDeSsBD6LKBSx/7uH2EOIDizGwfZNF3u7X+gVBMy2V7rTClDJM1eT9QuLMfMakpZkIe2PpGE4g5zbGZixn9er+wEmzk2mt20RImMeLK3jyd6vPb1/Ph9+bTEuEXi6/WDxJ6+b5peWydKOdY1tSbkWZgdi+Bup72DLUGZATE3+Ju5+rFXtb/1/po5dZirhaSRZjZA6sQhyFM/ZhIj92mUM8JJrhkeAC0iJejn4SW8ps2NoPm0kAfVu6apgVACaNmFb4nBAb2k1KWru+UMQnV+VxDVdxhpV628Tn9+8oDg6c+dO3RCCmw+nUUPjeGU0k19S6fNIbNPRlElS31QGL4H0IazZqnE+kw6ojn4Q44h8u7iOfpeanVumtp0lJs6dE2nRw0EdAlt535iQbxHIOy2x5m9IdJ6q1wWFFQDskG+ybN2Qy7SZMQtjjOqM+CmdeAnQGVwxowSDPbHfFpYeCEb+Wzya337Jy9yJwkfa+V7e7Lkv9/OysEsV4hJrOh8YXu9a4qBWZvZHnIO7zRbz7cqVBKmdrL2iGqpEUv/x5onjNQwpjSVX5S+ZRBZTzah0w186IpXVxsU8dSk0yeQskblrwARAQABzSlBbnRvaW5lIEJlYXVwcsOpIDxhbmFyY2F0QHRvcnByb2plY3Qub3JnPsLBlAQTAQgAPgIbAwULCQgHAwUVCgkICwUWAgMBAAIeAQIXgBYhBI3JAc5kFGwEitUPu3khUlJ7dZIeBQJihnFIBQkacFLiAAoJEHkhUlJ7dZIeXNAP/RsX+27l9K5uGspEaMH6jabAFTQVWD8Ch1om9YvrBgfYtq2k/m4WlkMh9IpT89Ahmlf0eq+V1Vph4wwXBS5McK0dzoFuHXJa1WHThNMaexgHhqJOs
 S60bWyLH4QnGxNaOoQvuAXiCYV4amKl7hSuDVZEn/9etDgm/UhGn2KS3yg0XFsqI7V/3RopHiDT+k7+zpAKd3st2V74w6ht+EFp2Gj0sNTBoCdbmIkRhiLyH9S4B+0Z5dUCUEopGIKKOSbQwyD5jILXEi7VTZhN0CrwIcCuqNo7OXI6e8gJd8McymqK4JrVoCipJbLzyOLxZMxGz8Ki0b9O844/DTzwcYcg9I1qogCsGmZfgVze2XtGxY+9zwSpeCLeef6QOPQ0uxsEYSfVgS+onCesSRCgwAPmppPiva+UlGuIMun87gPpQpV2fqFg/V8zBxRvs6YTGcfcQjfMoBHmZTGb+jk1//QAgnXMO7fGG38YH7iQSSzkmodrH2s27ZKgUTHVxpBL85ptftuRqbR7MzIKXZsKdA88kjIKKXwMmez9L1VbJkM4k+1Kzc5KdVydwi+ujpNegF6ZU8KDNFiN9TbDOlRxK5R+AjwdS8ZOIa4nci77KbNF9OZuO3l/FZwiKp8IFJ1nK7uiKUjmCukL0od/6X2rJtAzJmO5Co93ZVrd5r48oqUvjklzzsBNBFmeC3oBCADEV28RKzbv3dEbOocOsJQWr1R0EHUcbS270CrQZfb9VCZWkFlQ/1ypqFFQSjmmUGbNX2CG5mivVsW6Vgm7gg8HEnVCqzL02BPY4OmylskYMFI5Bra2wRNNQBgjg39L9XU4866q3BQzJp3r0fLRVH8gHM54Jf0FVmTyHotR/Xiw5YavNy2qaQXesqqUv8HBIha0rFblbuYI/cFwOtJ47gu0QmgrU0ytDjlnmDNx4rfsNylwTIHS0Oc7Pezp7MzLmZxnTM9b5VMprAXnQr4rewXCOUKBSto+j4rD5/77DzXw96bbueNruaupb2Iy2OHXNGkB0vKFD3xHsXE2x75NBovtABEBAAHCwqwEGAEIACAWIQSNyQHOZBRsBIrVD7t5IVJSe3WSHgUCWZ4LegIbAgFACRB5IV
 JSe3WSHsB0IAQZAQgAHRYhBHsWQgTQlnI7AZY1qz6h3d2yYdl7BQJZngt6AAoJED6h3d2yYdl7CowH/Rp7GHEoPZTSUK8Ss7crwRmuAIDGBbSPkZbGmm4bOTaNs/gealc2tsVYpoMx7aYgqUW+t+84XciKHT+bjRv8uBnHescKZgDaomDuDKc2JVyx6samGFYuYPcGFReRcdmH0FOoPCn7bMW5mTPztV/wIA80LZD9kPKIXanfUyI3HLP0BPwZG4WTpKzJaalR1BNwu2oF6kEK0ymH3LfDiJ5Sr6emI2jrm4gH+/19ux/x+ST4tvm2PmH3BSQOPzgiqDiFd7RZoAIhmwr3FW4epsK9LtSxsi9gZ2vATBKO1oKtb6olW/keQT6uQCjqPSGojwzGRT2thEANH+5t6Vh0oDPZhrKUXRAAxHMBNHEaoo/M0sjZo+5OF3Ig1rMnI6XbKskLv6hu13cCymW0w/5E4XuYnyQ1cNC3pLvqDQbDx5mAPfBVHuqxJdRLQ3yDM/D2QIsxnkzQwi0FsJuni4vuJzWK/NHHDCvxMCh0YmSgbptUtgW8/niatd2Y6MbfRGxUHoctKtzqzivC8hKMTFrj4AbZhg/e9QVCsh5zSXtpWP0qFDJsxRMx0/432n9d4XUiy4U672r9Q09SsynB3QN6nTaCTWCIxGxjIb+8kJrRqTGwy/PElHX6kF0vQUWZNf2ITV1sd6LK/s/7sH+x4rzgUEHrsKr/qPvY3rUY/dQLd+owXesY83ANOu6oMWhSJnPMksbNa4tIKKbjmw3CFIOfoYHOWf3FtnydHNXoXfj4nBX8oSnkfhLILTJgf6JDFXfw6mTsv/jMzIfDs7PO1LK2oMK0+prSvSoM8bP9dmVEGIurzsTGjhTOBcb0zgyCmYVD3S48vZlTgHszAes1zwaCyt3/tOwrzU5JsRJVns+B/TUYaR/u3oIDMDygvE5ObWxXaFVnCC59r+zl0FazZ0ouyk2AYIR
 zHf+n1n98HCngRO4FRel2yzGDYO2rLPkXRm+NHCRvUA/i4zGkJs2AV0hsKK9/x8uMkBjHAdAheXhY+CsizGzsKjjfwvgqf84LwAzSDdZqLVE2yGTOwU0ESiArJwEQAJhtnC6pScWjzvvQ6rCTGAai6hrRiN6VLVVFLIMaMnlUp92EtgVSNpw6kANtRTpKXUB5fIPZVUrVdfEN06t96/6LE42tgifDAFyFTZY5FdHHri1GG/Cr39MpW2VqCDCtTTPVWHTUlU1ZG631BJ+9NB+ce58TmLr6wBTQrT+W367eRFBC54EsLNb7zQAspCn9pw1xf1XNHOGnrAQ4r9BXhOW5B8CzRd4nLRQwVgtw/c5M/bjemAOoq2WkwN+0mfJe4TSfHwFUozXuN274X+0Gr10fhp8xEDYuQM0qu6W3aDXMBBwIu0jTNudEELsTzhKUbqpsBc9WjwNMCZoCuSw/RTpFBV35mXbqQoQgbcU7uWZslLl9Wvv/C6rjXgd+GeX8SGBjTqq1ZkTv5UXLHTNQzPnbkNEExzqToi/QdSjFMIACnakeOSxc0ckfnsd9pfGv1PUyPyiwrHiqWFzBijzGIZEHxhNGFxAkXwTJR7Pd40a7RDxwbO6p/TSIIum41JtteehLHwTRDdQNMoyfLxuNLEtNYS0uR2jYI1EPQfCNWXCdT2ZK/l6GVP6jyB/olHBIOr+oVXqJh+48ki8cATPczhq3fUr7UivmguGwD67/4omZ4PCKtz1hNndnyYFS9QldEGo+AsB3AoUpVIA0XfQVkxD9IZr+Zu6aJ6nWq4M2bsoxABEBAAHCwXYEGAEIACACGwwWIQSNyQHOZBRsBIrVD7t5IVJSe3WSHgUCWPerZAAKCRB5IVJSe3WSHkIgEACTpxdn/FKrwH0/LDpZDTKWEWm4416l13RjhSt9CUhZ/Gm2GNfXcVTfoF/jKXXgjHcV1DHjfLUPmPVwMdqlf5ACOiFqIUM2ag/OEARh356w
 YG7YEobMjX0CThKe6AV2118XNzRBw/S2IO1LWnL5qaGYPZONUa9Pj0OaErdKIk/V1wge8Zoav2fQPautBcRLW5VA33PH1ggoqKQ4ES1hc9HC6SYKzTCGixu97mu/vjOa8DYgM+33TosLyNy+bCzw62zJkMf89X0tTSdaJSj5Op0SrRvfgjbC2YpJOnXxHr9qaXFbBZQhLjemZi6zRzUNeJ6A3Nzs+gIc4H7s/bYBtcd4ugPEhDeCGffdS3TppH9PnvRXfoa5zj5bsKFgjqjWolCyAmEvd15tXz5yNXtvrpgDhjF5ozPiNp/1EeWX4DxbH2i17drVu4fXwauFZ6lcsAcJxnvCA28RlQlmEQu/gFOx1axVXf6GIuXnQSjQN6qJbByUYrdc/cFCxPO2/lGuUxnufN9Tvb51Qh54laPgGLrlD2huQeSD9Sxa0MNUjNY0qLqaReT99Ygb2LPYGSLoFVx9iZz6sZNt07LqCx9qNgsJwsdmwYsNpMuFbc7nkWjtlEqzsXZHTvYN654p43S+hcAhmmOzQZcew6h71fAJLciiqsPBnCEdgCGFAWhZZdPkMA==
After the change, the entire key fits on a single line, neat!
Autocrypt: addr=anarcat@torproject.org; prefer-encrypt=nopreference;
 keydata=xjMEZHZPzhYJKwYBBAHaRw8BAQdAWdVzOFRW6FYVpeVaDo3sC4aJ2kUW4ukdEZ36UJLAHd7NKUFudG9pbmUgQmVhdXByw6kgPGFuYXJjYXRAdG9ycHJvamVjdC5vcmc+wpUEExYIAD4WIQS7ts1MmNdOE1inUqYCKTpvpOU0cwUCZHZgvwIbAwUJAeEzgAULCQgHAwUVCgkICwUWAgMBAAIeAQIXgAAKCRACKTpvpOU0c47SAPdEqfeHtFDx9UPhElZf7nSM69KyvPWXMocu9Kcu/sw1AQD5QkPzK5oxierims6/KUkIKDHdt8UcNp234V+UdD/ZB844BGR2UM4SCisGAQQBl1UBBQEBB0CYZha2IMY54WFXMG4S9/Smef54Pgon99LJ/hJ885p0ZAMBCAfCdwQYFggAIBYhBLu2zUyY104TWKdSpgIpOm+k5TRzBQJkdlDOAhsMAAoJEAIpOm+k5TRzBg0A+IbcsZhLx6FRIqBJCdfYMo7qovEo+vX0HZsUPRlq4HkBAIctCzmH3WyfOD/aUTeOF3tY+tIGUxxjQLGsNQZeGrQI
Note that I have implemented my own kind of ridiculous Autocrypt support for the Notmuch Emacs email client I use, see this elisp code. To import keys, I pipe the message into this script which is basically just:
sq autocrypt decode   gpg --import
... thanks to Sequoia best-of-class Autocrypt support.

Note on OpenPGP usage While some have claimed OpenPGP's death, I believe those are overstated. Maybe it's just me, but I still use OpenPGP for my password management, to authenticate users and messages, and it's the interface to my YubiKey for authenticating with SSH servers. I understand people feel that OpenPGP is possibly insecure, counter-intuitive and full of problems, but I think most of those problems should instead be attributed to its current flagship implementation, GnuPG. I have tried to work with GnuPG for years, and it keeps surprising me with evilness and oddities. I have high hopes that the Sequoia project can bring some sanity into this space, and I also hope that RFC4880bis can eventually get somewhere so we have a more solid specification with more robust crypto. It's kind of a shame that this has dragged on for so long, but Update: there's a separate draft called openpgp-crypto-refresh that might actually be adopted as the "OpenPGP RFC" soon! And it doesn't keep real work from happening in Sequoia and other implementations. Thunderbird rewrote their OpenPGP implementation with RNP (which was, granted, a bumpy road because it lost compatibility with GnuPG) and Sequoia now has a certificate store with trust management (but still no secret storage), preliminary OpenPGP card support and even a basic GnuPG compatibility layer. I'm also curious to try out the OpenPGP CA capabilities. So maybe it's just because I'm becoming an old fart that doesn't want to change tools, but so far I haven't seen a good incentive in switching away from OpenPGP, and haven't found a good set of tools that completely replace it. Maybe OpenSSH's keys and CA can eventually replace it, but I suspect they will end up rebuilding most of OpenPGP anyway, just more slowly. If they do, let's hope they avoid the mistakes our community has done in the past at least...

8 August 2023

Matthew Garrett: Updating Fedora the unsupported way

I dug out a computer running Fedora 28, which was released 2018-04-01 - over 5 years ago. Backing up the data and re-installing seemed tedious, but the current version of Fedora is 38, and while Fedora supports updates from N to N+2 that was still going to be 5 separate upgrades. That seemed tedious, so I figured I'd just try to do an update from 28 directly to 38. This is, obviously, extremely unsupported, but what could possibly go wrong?

Running sudo dnf system-upgrade download --releasever=38 didn't successfully resolve dependencies, but sudo dnf system-upgrade download --releasever=38 --allowerasing passed and dnf started downloading 6GB of packages. And then promptly failed, since I didn't have any of the relevant signing keys. So I downloaded the fedora-gpg-keys package from F38 by hand and tried to install it, and got a signature hdr data: BAD, no. of bytes(88084) out of range error. It turns out that rpm doesn't handle cases where the signature header is larger than a few K, and RPMs from modern versions of Fedora. The obvious fix would be to install a newer version of rpm, but that wouldn't be easy without upgrading the rest of the system as well - or, alternatively, downloading a bunch of build depends and building it. Given that I'm already doing all of this in the worst way possible, let's do something different.

The relevant code in the hdrblobRead function of rpm's lib/header.c is:

int32_t il_max = HEADER_TAGS_MAX;
int32_t dl_max = HEADER_DATA_MAX;

if (regionTag == RPMTAG_HEADERSIGNATURES)
il_max = 32;
dl_max = 8192;


which indicates that if the header in question is RPMTAG_HEADERSIGNATURES, it sets more restrictive limits on the size (no, I don't know why). So I installed rpm-libs-debuginfo, ran gdb against librpm.so.8, loaded the symbol file, and then did disassemble hdrblobRead. The relevant chunk ends up being:

0x000000000001bc81 <+81>: cmp $0x3e,%ebx
0x000000000001bc84 <+84>: mov $0xfffffff,%ecx
0x000000000001bc89 <+89>: mov $0x2000,%eax
0x000000000001bc8e <+94>: mov %r12,%rdi
0x000000000001bc91 <+97>: cmovne %ecx,%eax

which is basically "If ebx is not 0x3e, set eax to 0xffffffff - otherwise, set it to 0x2000". RPMTAG_HEADERSIGNATURES is 62, which is 0x3e, so I just opened librpm.so.8 in hexedit, went to byte 0x1bc81, and replaced 0x3e with 0xfe (an arbitrary invalid value). This has the effect of skipping the if (regionTag == RPMTAG_HEADERSIGNATURES) code and so using the default limits even if the header section in question is the signatures. And with that one byte modification, rpm from F28 would suddenly install the fedora-gpg-keys package from F38. Success!

But short-lived. dnf now believed packages had valid signatures, but sadly there were still issues. A bunch of packages in F38 had files that conflicted with packages in F28. These were largely Python 3 packages that conflicted with Python 2 packages from F28 - jumping this many releases meant that a bunch of explicit replaces and the like no longer existed. The easiest way to solve this was simply to uninstall python 2 before upgrading, and avoiding the entire transition. Another issue was that some data files had moved from libxcrypt-common to libxcrypt, and removing libxcrypt-common would remove libxcrypt and a bunch of important things that depended on it (like, for instance, systemd). So I built a fake empty package that provided libxcrypt-common and removed the actual package. Surely everything would work now?

Ha no. The final obstacle was that several packages depended on rpmlib(CaretInVersions), and building another fake package that provided that didn't work. I shouted into the void and Bill Nottingham answered - rpmlib dependencies are synthesised by rpm itself, indicating that it has the ability to handle extensions that specific packages are making use of. This made things harder, since the list is hard-coded in the binary. But since I'm already committing crimes against humanity with a hex editor, why not go further? Back to editing librpm.so.8 and finding the list of rpmlib() dependencies it provides. There were a bunch, but I couldn't really extend the list. What I could do is overwrite existing entries. I tried this a few times but (unsurprisingly) broke other things since packages depended on the feature I'd overwritten. Finally, I rewrote rpmlib(ExplicitPackageProvide) to rpmlib(CaretInVersions) (adding an extra '\0' at the end of it to deal with it being shorter than the original string) and apparently nothing I wanted to install depended on rpmlib(ExplicitPackageProvide) because dnf finished its transaction checks and prompted me to reboot to perform the update. So, I did.

And about an hour later, it rebooted and gave me a whole bunch of errors due to the fact that dbus never got started. A bit of digging revealed that I had no /etc/systemd/system/dbus.service, a symlink that was presumably introduced at some point between F28 and F38 but which didn't get automatically added in my case because well who knows. That was literally the only thing I needed to fix up after the upgrade, and on the next reboot I was presented with a gdm prompt and had a fully functional F38 machine.

You should not do this. I should not do this. This was a terrible idea. Any situation where you're binary patching your package manager to get it to let you do something is obviously a bad situation. And with hindsight performing 5 independent upgrades might have been faster. But that would have just involved me typing the same thing 5 times, while this way I learned something. And what I learned is "Terrible ideas sometimes work and so you should definitely act upon them rather than doing the sensible thing", so like I said, you should not do this in case you learn the same lesson.

comment count unavailable comments

1 August 2023

Jonathan Dowland: Interzone's new home

IZ #294, the latest issue IZ #294, the latest issue
The long running British1 SF Magazine Interzone has a new home and new editor, Gareth Jelley, starting with issue 294. It's also got a swanky new format ("JB6"): a perfect-bound, paperback novel size, perfect for fitting into an oversize coat or jeans pocket for reading on the train. I started reading Interzone in around 2003, having picked up an issue (#176) from Feb 2002 that was languishing on the shelves in Forbidden Planet. Once I discovered it I wondered why it had taken me so long. That issue introduced me to Greg Egan. I bought a number of back issues on eBay, to grab issues with stories by people including Terry Pratchett, Iain Banks, Alastair Reynolds, and others.
IZ #194: The first by TTA press IZ #194: The first by TTA press
A short while later in early 2004, after 22 years, Interzone's owner and editorship changed from David Pringle to Andy Cox and TTA Press. I can remember the initial transition was very jarring: the cover emphasised expanding into coverage of Manga, Graphic Novels and Video Games (which ultimately didn't happen) but after a short period of experimentation it quickly settled down into a similarly fantastic read. I particularly liked the move to a smaller, perfect-bound form-factor in 2012. I had to double-check this but I'd been reading IZ throughout the TTA era and it lasted 18 years! Throughout that time I have discovered countless fantastic authors that I would otherwise never have experienced. Some (but by no means all) are Dominic Green, Daniel Kaysen, Chris Beckett, C cile Cristofari, Aliya Whiteley, Tim Major, Fran oise Harvey, Will McIntosh. Cox has now retired (after 100 issues and a tenure almost as long as Pringle) and handed the reins to Gareth Jelley/MYY Press, who have published their first issue, #294. Jelley is clearly putting a huge amount of effort into revitalizing the magazine. There's a new homepage at interzone.press but also companion internet presences: a plethora of digital content at interzone.digital, Interzone Socials (a novel idea), a Discord server, a podcast, and no doubt more. Having said that, the economics of small magazines have been perilous for a long time, and that hasn't changed, so I think the future of IZ (in physical format at least) is in peril. If you enjoy short fiction, fresh ideas, SF/F/Fantastika; why not try a subscription to Interzone, whilst you still can!

  1. Interzone has always been "British", in some sense, but never exclusively so. I recall fondly a long-term project under Pringle to publish a lot of Serbian writer Zoran ivkovi , for example, and the very first story I read was by Australian Greg Egan. Under Jelley, the magazine is being printed in Poland and priced in Euros. I expect it to continue to attract and publish writers from all over the place.

22 July 2023

Andrew Cater: 20230722 - Releasing Debian testing for Debian Bookworm point release 12.1

And so we're back at Steve's in Cambridge for release testing. A few testers here: we're now well into testing the various iso images.So we have Sledge, RattusRattus, Isy, smcv and myself. We've also been joined here by Helen who has just done her first install. Online we've got a couple of new folk - luna, The_Blode. Thanks to everyone involved - we always need all the help we can get.
Cold and grey outside: usual warmth in here. Also the usual snake of cables to trip over across the floor.

18 July 2023

Jonathan Dowland: Bea's 3D printer

My daughter Beatrice asked for me to print her a 3D printer.
  Bea's 3D printer   Bea's 3D printer
Most of the model is from https://www.printables.com/model/355917-miniature-3d-printer-model-toy, and the unicorn is from https://www.printables.com/model/385926-unicorn.

12 July 2023

Reproducible Builds: Reproducible Builds in June 2023

Welcome to the June 2023 report from the Reproducible Builds project In our reports, we outline the most important things that we have been up to over the past month. As always, if you are interested in contributing to the project, please visit our Contribute page on our website.


We are very happy to announce the upcoming Reproducible Builds Summit which set to take place from October 31st November 2nd 2023, in the vibrant city of Hamburg, Germany. Our summits are a unique gathering that brings together attendees from diverse projects, united by a shared vision of advancing the Reproducible Builds effort. During this enriching event, participants will have the opportunity to engage in discussions, establish connections and exchange ideas to drive progress in this vital field. Our aim is to create an inclusive space that fosters collaboration, innovation and problem-solving. We are thrilled to host the seventh edition of this exciting event, following the success of previous summits in various iconic locations around the world, including Venice, Marrakesh, Paris, Berlin and Athens. If you re interesting in joining us this year, please make sure to read the event page] which has more details about the event and location. (You may also be interested in attending PackagingCon 2023 held a few days before in Berlin.)
This month, Vagrant Cascadian will present at FOSSY 2023 on the topic of Breaking the Chains of Trusting Trust:
Corrupted build environments can deliver compromised cryptographically signed binaries. Several exploits in critical supply chains have been demonstrated in recent years, proving that this is not just theoretical. The most well secured build environments are still single points of failure when they fail. [ ] This talk will focus on the state of the art from several angles in related Free and Open Source Software projects, what works, current challenges and future plans for building trustworthy toolchains you do not need to trust.
Hosted by the Software Freedom Conservancy and taking place in Portland, Oregon, FOSSY aims to be a community-focused event: Whether you are a long time contributing member of a free software project, a recent graduate of a coding bootcamp or university, or just have an interest in the possibilities that free and open source software bring, FOSSY will have something for you . More information on the event is available on the FOSSY 2023 website, including the full programme schedule.
Marcel Fourn , Dominik Wermke, William Enck, Sascha Fahl and Yasemin Acar recently published an academic paper in the 44th IEEE Symposium on Security and Privacy titled It s like flossing your teeth: On the Importance and Challenges of Reproducible Builds for Software Supply Chain Security . The abstract reads as follows:
The 2020 Solarwinds attack was a tipping point that caused a heightened awareness about the security of the software supply chain and in particular the large amount of trust placed in build systems. Reproducible Builds (R-Bs) provide a strong foundation to build defenses for arbitrary attacks against build systems by ensuring that given the same source code, build environment, and build instructions, bitwise-identical artifacts are created.
However, in contrast to other papers that touch on some theoretical aspect of reproducible builds, the authors paper takes a different approach. Starting with the observation that much of the software industry believes R-Bs are too far out of reach for most projects and conjoining that with a goal of to help identify a path for R-Bs to become a commonplace property , the paper has a different methodology:
We conducted a series of 24 semi-structured expert interviews with participants from the Reproducible-Builds.org project, and iterated on our questions with the reproducible builds community. We identified a range of motivations that can encourage open source developers to strive for R-Bs, including indicators of quality, security benefits, and more efficient caching of artifacts. We identify experiences that help and hinder adoption, which heavily include communication with upstream projects. We conclude with recommendations on how to better integrate R-Bs with the efforts of the open source and free software community.
A PDF of the paper is now available, as is an entry on the CISPA Helmholtz Center for Information Security website and an entry under the TeamUSEC Human-Centered Security research group.
On our mailing list this month:
The antagonist is David Schwartz, who correctly says There are dozens of complex reasons why what seems to be the same sequence of operations might produce different end results, but goes on to say I totally disagree with your general viewpoint that compilers must provide for reproducability [sic]. Dwight Tovey and I (Larry Doolittle) argue for reproducible builds. I assert Any program especially a mission-critical program like a compiler that cannot reproduce a result at will is broken. Also it s commonplace to take a binary from the net, and check to see if it was trojaned by attempting to recreate it from source.

Lastly, there were a few changes to our website this month too, including Bernhard M. Wiedemann adding a simplified Rust example to our documentation about the SOURCE_DATE_EPOCH environment variable [ ], Chris Lamb made it easier to parse our summit announcement at a glance [ ], Mattia Rizzolo added the summit announcement at a glance [ ] itself [ ][ ][ ] and Rahul Bajaj added a taxonomy of variations in build environments [ ].

Distribution work 27 reviews of Debian packages were added, 40 were updated and 8 were removed this month adding to our knowledge about identified issues. A new randomness_in_documentation_generated_by_mkdocs toolchain issue was added by Chris Lamb [ ], and the deterministic flag on the paths_vary_due_to_usrmerge issue as we are not currently testing usrmerge issues [ ] issues.
Roland Clobus posted his 18th update of the status of reproducible Debian ISO images on our mailing list. Roland reported that all major desktops build reproducibly with bullseye, bookworm, trixie and sid , but he also mentioned amongst many changes that not only are the non-free images being built (and are reproducible) but that the live images are generated officially by Debian itself. [ ]
Jan-Benedict Glaw noticed a problem when building NetBSD for the VAX architecture. Noting that Reproducible builds [are] probably not as reproducible as we thought , Jan-Benedict goes on to describe that when two builds from different source directories won t produce the same result and adds various notes about sub-optimal handling of the CFLAGS environment variable. [ ]
F-Droid added 21 new reproducible apps in June, resulting in a new record of 145 reproducible apps in total. [ ]. (This page now sports missing data for March May 2023.) F-Droid contributors also reported an issue with broken resources in APKs making some builds unreproducible. [ ]
Bernhard M. Wiedemann published another monthly report about reproducibility within openSUSE

Upstream patches

Testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In June, a number of changes were made by Holger Levsen, including:
  • Additions to a (relatively) new Documented Jenkins Maintenance (djm) script to automatically shrink a cache & save a backup of old data [ ], automatically split out previous months data from logfiles into specially-named files [ ], prevent concurrent remote logfile fetches by using a lock file [ ] and to add/remove various debugging statements [ ].
  • Updates to the automated system health checks to, for example, to correctly detect new kernel warnings due to a wording change [ ] and to explicitly observe which old/unused kernels should be removed [ ]. This was related to an improvement so that various kernel issues on Ubuntu-based nodes are automatically fixed. [ ]
Holger and Vagrant Cascadian updated all thirty-five hosts running Debian on the amd64, armhf, and i386 architectures to Debian bookworm, with the exception of the Jenkins host itself which will be upgraded after the release of Debian 12.1. In addition, Mattia Rizzolo updated the email configuration for the @reproducible-builds.org domain to correctly accept incoming mails from jenkins.debian.net [ ] as well as to set up DomainKeys Identified Mail (DKIM) signing [ ]. And working together with Holger, Mattia also updated the Jenkins configuration to start testing Debian trixie which resulted in stopped testing Debian buster. And, finally, Jan-Benedict Glaw contributed patches for improved NetBSD testing.

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

John Goerzen: Backing Up and Archiving to Removable Media: dar vs. git-annex

This is the fourth in a series about archiving to removable media (optical discs such as BD-Rs and DVD+Rs or portable hard drives). Here are the first three parts: I want to state at the outset that this is not a general review of dar or git-annex. This is an analysis of how those tools stack up to a particular use case. Neither tool focuses on this use case, and I note it is particularly far from the more common uses of git-annex. For instance, both tools offer support for cloud storage providers and special support for ssh targets, but neither of those are in-scope for this post. Comparison Matrix As part of this project, I made a comparison matrix which includes not just dar and git-annex, but also backuppc, bacula/bareos, and borg. This may give you some good context, and also some reference for other projects in this general space. Reviewing the Goals I identified some goals in part 1. They are all valid. As I have thought through the project more, I feel like I should condense them into a simpler ordered list, with the first being the most important. I omit some things here that both dar and git-annex can do (updates/incrementals, for instance; see the expanded goals list in part 1). Here they are:
  1. The tool must not modify the source data in any way.
  2. It must be simple to create or update an archive. Processes that require a lot of manual work, are flaky, or are difficult to do correctly, are unlikely to be done correctly and often. If it s easy to do right, I m more likely to do it. Put another way: an archive never created can never be restored.
  3. The chances of a successful restore by someone that is not me, that doesn t know Linux, and is at least 10 years in the future, should be maximized. This implies a simple toolset, solid support for dealing with media errors or missing media, etc.
  4. Both a partial point-in-time restore and a full restore should be possible. The full restore must, at minimum, provide a consistent directory tree; that is, deletions, additions, and moves over time must be accurately reflected. Preserving modification times is a near-requirement, and preserving hard links, symbolic links, and other POSIX metadata is a significant nice-to-have.
  5. There must be a strategy to provide redundancy; for instance, a way for one set of archive discs to be offsite, another onsite, and the two to be periodically swapped.
  6. Use storage space efficiently.
Let s take a look at how the two stack up against these goals. Goal 1: Not modifying source data With dar, this is accomplished. dar --create does not modify source data (and even has a mode to avoid updating atime) so that s done. git-annex normally does modify source data, in that it typically replaces files with symlinks into its hash-indexed storage directory. It can instead use hardlinks. In either case, you will wind up with files that have identical content (but may have originally been separate, non-linked files) linked together with git-annex. This would cause me trouble, as well as run the risk of modifying timestamps. So instead of just storing my data under a git-annex repo as is its most common case, I use the directory special remote with importtree=yes to sort of import the data in. This, plus my desire to have the repos sensible and usable on non-POSIX operating systems, accounts for a chunk of the git-annex complexity you see here. You wouldn t normally see as much complexity with git-annex (though, as you will see, even without the directory special remote, dar still has less complexity). Winner: dar, though I demonstrated a working approach with git-annex as well. Goal 2: Simplicity of creating or updating an archive Let us simply start by recognizing this: Both tools have a lot of power, but I must say, it is easier to wrap my head around what dar is doing than what git-annex is doing. Everything dar does is with files: here are the files to archive, here is an archive file, here is a detached (isolated) catalog. It is very straightforward. It took me far less time to develop my dar page than my git-annex page, despite having existing familiarity with both tools. As I pointed out in part 2, I still don t fully understand how git-annex syncs metadata. Unsolved mysteries from that post include why the two git-annex drives had no idea what was on the other drives, and why the export operation silenty did nothing. Additionally, for the optical disc case, I had to create a restricted-size filesystem/dataset for git-annex to write into in order to get the desired size limit. Looking at the optical disc case, dar has a lot of nice infrastructure built in. With pause and execute, it can very easily be combined with disc burning operations. slice will automatically limit the size of a given slice, regardless of how much disk space is free, meaning that the git-annex tricks of creating smaller filesystems/datasets are unnecessary with dar. To create an initial full backup with dar, you just give it the size of the device, and it will automatically split up the archive, with hooks to integrate for burning or changing drives. About as easy as you could get. With git-annex, you would run the commands to have it fill up the initial filesystem, then burn the disc (or remove the drive), then run the commands to create another repo on the second filesystem, and so forth. With hard drives, with git-annex you would do something similar; let it fill up a repo on a drive, and if it exits with a space error, swap in the next. With dar, you would slice as with an optical disk. Dar s slicing is less convenient in this case, though, as it assumes every drive is the same size and yours may not be. You could work around that by using a slice size no bigger than the smallest drive, and putting multiple slices on larger drives if need be. If a single drive is large enough to hold your entire data set, though, you need not worry about this with either tool. Here s a warning about git-annex: it won t store anything beneath directories named .git. My use case doesn t have many of those. If your use case does, you re going to have to figure out what to do about it. Maybe rename them to something else while the backup runs? In any case, it is simply a fact that git-annex cannot back up git repositories, and this cuts against being able to back up things correctly. Another point is that git-annex has scalability concerns. If your archive set gets into the hundreds of thousands of files, you may need to split it into multiple distinct git-annex repositories. If this occurs and it will in my case it may serve to dull the shine of some of git-annex s features such as location tracking. A detour down the update strategies path Update strategies get a little more complicated with both. First, let s consider: what exactly should our update strategy be? For optical discs, I might consider doing a monthly update. I could burn a disc (or more than one, if needed) regardless of how much data is going to go onto it, because I want no more than a month s data lost in any case. An alternative might be to spool up data until I have a disc s worth, and then write that, but that could possibly mean months between actually burning a disc. Probably not good. For removable drives, we re unlikely to use a new drive each month. So there it makes sense to continue writing to the drive until it s full. Now we have a choice: do we write and preserve each month s updates, or do we eliminate intermediate changes and just keep the most recent data? With both tools, the monthly burn of an optical disc turns out to be very similar to the initial full backup to optical disc. The considerations for spanning multiple discs are the same. With both tools, we would presumably want to keep some metadata on the host so that we don t have to refer to a previous disc to know what was burned. In the dar case, that would be an isolated catalog. For git-annex, it would be a metadata-only repo. I illustrated both of these in parts 2 and 3. Now, for hard drives. Assuming we want to continue preserving each month s updates, with dar, we could just write an incremental to the drive each month. Assuming that the size of the incremental is likely far smaller than the size of the drive, you could easily enough do this. More fancily, you could look at the free space on the drive and tell dar to use that as the size of the first slice. For git-annex, you simply avoid calling drop/dropunused. This will cause the old versions of files to accumulate in .git/annex. You can get at them with git annex commands. This may imply some degree of elevated risk, as you are modifying metadata in the repo each month, which with dar you could chmod a-w or even chattr +i the archive files once written. Hopefully this elevated risk is low. If you don t want to preserve each month s updates, with dar, you could just write an incremental each month that is based on the previous drive s last backup, overwriting the previous. That implies some risk of drive failure during the time the overwrite is happening. Alternatively, you could write an incremental and then use dar to merge it into the previous incremental, creating a new one. This implies some degree of extra space needed (maybe on a different filesystem) while doing this. With git-annex, you would use drop/dropunused as I demonstrated in part 2. The winner for goal 2 is dar. The gap is biggest with optical discs and more narrow with hard drives, thanks to git-annex s different options for updates. Still, I would be more confident I got it right with dar. Goal 3: Greatest chance of successful restore in the distant future If you use git-annex like I suggested in part 2, you will have a set of discs or drives that contain a folder structure with plain files in them. These files can be opened without any additional tools at all. For sheer ability to get at raw data, git-annex has the edge. When you talk about getting a consistent full restore without multiple copies of renamed files or deleted files coming back then you are going to need to use git-annex to do that. Both git-annex and dar provide binaries. Dar provides a win64 version on its Sourceforge page. On the author s releases site, you can find the win64 version in addition to a statically-linked x86_64 version for Linux. The git-annex install page mostly directs you to package managers for your distribution, but the downloads page also lists builds for Linux, Windows, and Mac OS X. The Linux version is dynamic, but ships most of its .so files alongside. The Windows version requires cygwin.dll, and all versions require you to also install git itself. Both tools are in package managers for Mac OS X, Debian, FreeBSD, and so forth. Let s just say that you are likely to be able to run either one on a future Windows or Linux system. There are also GUI frontends for dar, such as DARGUI and gdar. This can increase the chances of a future person being able to use the software easily. git-annex has the assistant, which is based on a different use case and probably not directly helpful here. When it comes to doing the actual restore process using software, dar provides the easier process here. For dealing with media errors and the like, dar can integrate with par2. While technically you could use par2 against the files git-annex writes, that s more cumbersome to manage to the point that it is likely not to be done. Both tools can deal reasonably with missing media entirely. I m going to give the edge on this one to git-annex; while dar does provide the easier restore and superior tools for recovering from media errors, the ability to access raw data as plain files without any tools at all is quite compelling. I believe it is the most critical advantage git-annex has, and it s a big one. Goal 4: Support high-fidelity partial and full restores Both tools make it possible to do a full restore reflecting deletions, additions, and so forth. Dar, as noted, is easier for this, but it is possible with git-annex. So, both can achieve a consistent restore. Part of this goal deals with fidelity of the restore: preserving timestamps, hard and symbolic links, ownership, permissions, etc. Of these, timestamps are the most important for me. git-annex can t do any of that. dar does all of it. Some of this can be worked around using mtree as I documented in part 2. However, that implies a need to also provide mtree on the discs for future users, and I m not sure mtree really exists for Windows. It also cuts against the argument that git-annex discs can be used without any tools. It is true, they can, but all you will get is filename and content; no accurate date. Timestamps are often highly relevant for everything from photos to finding an elusive document or record. Winner: dar. Goal 5: Supporting backup strategies with redundancy My main goal here is to have two separate backup sets: one that is offsite, and one that is onsite. Depending on the strategy and media, they might just always stay that way, or periodically rotate. For instance, with optical discs, you might just burn two copies of every disc and store one at each place. For hard drives, since you will be updating the content of them, you might swap them periodically. This is possible with both tools. With both tools, if using the optical disc scheme I laid out, you can just burn two identical copies of each disc. With the hard drive case, with dar, you can keep two directories of isolated catalogs, one for each drive set. A little identifier file on each drive will let you know which set to use. git-annex can track locations itself. As I demonstrated in part 2, you can make each drive its own repo, add all drives from a given drive set to a git-annex group. When initializing a drive, you tell git-annex what group it s a prt of. From then on, git-annex knows what content is in each group and will add whatever a given drive s group needs to that drive. It s possible to do this with both, but the winner here is git-annex. Goal 6: Efficient use of storage Here are situations in which one or the other will be more efficient: The winner depends on your particular situation. Other notes While not part of the goals above, dar is capable of using tapes directly. While not as common, they are often used in communities of people that archive lots of data. Conclusions Overall, dar is the winner for me. It is simpler in most areas, easier to get correct, and scales very well. git-annex does, however, have some quite compelling points. Being able to access files as plain files is huge, and its location tracking is nicer than dar s, even when using dar_manager. Both tools are excellent and I recommend them both and for more than the particular scenario shown here. Both have fantastic and responsive authors.

11 July 2023

Dirk Eddelbuettel: RcppSpdlog 0.0.14 on CRAN: Upstream Update

Version 0.0.14 of RcppSpdlog is now on CRAN and has just been uploaded to Debian. RcppSpdlog bundles spdlog, a wonderful header-only C++ logging library with all the bells and whistles you would want that was written by Gabi Melman, and also includes fmt by Victor Zverovich. You can learn more at the nice package documention site. This release simply brings an update to the just release spdlog 1.12.0 from a few days ago. The NEWS entry for this release follows.

Changes in RcppSpdlog version 0.0.14 (2023-07-09)
  • Added new badge to README.md
  • Upgraded to upstream releases spdlog 1.12.0

Courtesy of my CRANberries, there is also a diffstat report. More detailed information is on the RcppSpdlog page, or the package documention site. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

29 June 2023

C.J. Collier: Converting a windows install to a libvirt VM

Reduce the size of your c: partition to the smallest it can be and then turn off windows with the understanding that you will never boot this system on the iron ever again.
Boot into a netinst installer image (no GUI). hold alt and press left arrow a few times until you get to a prompt to press enter. Press enter. In this example /dev/sda is your windows disk which contains the c: partition
and /dev/disk/by-id/usb0 is the USB-3 attached SATA controller that you have your SSD attached to (please find an example attached). This SSD should be equal to or larger than the windows disk for best compatability. A photo of a USB-3 attached SATA controller To find the literal path names of your detected drives you can run fdisk -l. Pay attention to the names of the partitions and the sizes of the drives to help determine which is which. Once you have a shell in the netinst installer, you should maybe be able to run a command like the following. This will duplicate the disk located at if (in file) to the disk located at of (out file) while showing progress as the status.
dd if=/dev/sda of=/dev/disk/by-id/usb0 status=progress
If you confirm that dd is available on the netinst image and the previous command runs successfully, test that your windows partition is visible in the new disk s partition table. The start block of the windows partition on each should match, as should the partition size.
fdisk -l /dev/disk/by-id/usb0
fdisk -l /dev/sda
If the output from the first is the same as the output from the second, then you are probably safe to proceed. Once you confirm that you have made and tested a full copy of the blocks from your windows drive saved on your usb disk, nuke your windows partition table from orbit.
dd if=/dev/zero of=/dev/sda bs=1M count=42
You can press alt-f1 to return to the Debian installer now. Follow the instructions to install Debian. Don t forget to remove all attached USB drives. Once you install Debian, press ctrl-alt-f3 to get a root shell. Add your user to the sudoers group:
# adduser cjac sudoers
log out
# exit
log in as your user and confirm that you have sudo
$ sudo ls
Don t forget to read the spider man advice enter your password you ll need to install virt-manager. I think this should help:
$ sudo apt-get install virt-manager libvirt-daemon-driver-qemu qemu-system-x86
insert the USB drive. You can now create a qcow2 file for your virtual machine.
$ sudo qemu-img convert -O qcow2 \
/dev/disk/by-id/usb0 \
/var/lib/libvirt/images/windows.qcow2
I personally create a volume group called /dev/vg00 for the stuff I want to run raw and instead of converting to qcow2 like all of the other users do, I instead write it to a new logical volume.
sudo lvcreate /dev/vg00 -n windows -L 42G # or however large your drive was
sudo dd if=/dev/disk/by-id/usb0 of=/dev/vg00/windows status=progress
Now that you ve got the qcow2 file created, press alt-left until you return to your GDM session. The apt-get install command above installed virt-manager, so log in to your system if you haven t already and open up gnome-terminal by pressing the windows key or moving your mouse/gesture to the top left of your screen. Type in gnome-terminal and either press enter or click/tap on the icon. I like to run this full screen so that I feel like I m in a space ship. If you like to feel like you re in a spaceship, too, press F11. You can start virt-manager from this shell or you can press the windows key and type in virt-manager and press enter. You ll want the shell to run commands such as virsh console windows or virsh list When virt-manager starts, right click on QEMU/KVM and select New.
In the New VM window, select Import existing disk image
When prompted for the path to the image, use the one we created with sudo qemu-img convert above.
Select the version of Windows you want.
Select memory and CPUs to allocate to the VM.
Tick the Customize configuration before install box
If you re prompted to enable the default network, do so now.
The default hardware layout should probably suffice. Get it as close to the underlying hardware as it is convenient to do. But Windows is pretty lenient these days about virtualizing licensed windows instances so long as they re not running in more than one place at a time. Good luck! Leave comments if you have questions.

Antoine Beaupr : Using signal-cli to cancel your Signal account

For obscure reasons, I have found myself with a phone number registered with Signal but without any device associated with it. This is the I lost my phone section in Signal support, which rather unhelpfully tell you that, literally:
Until you have access to your phone number, there is nothing that can be done with Signal.
To be fair, I guess that sort of makes sense: Signal relies heavily on phone numbers for identity. It's how you register to the service and how you recover after losing your phone number. If you have your PIN ready, you don't even change safety numbers! But my case is different: this phone number was a test number, associated with my tablet, because you can't link multiple Android device to the same phone number. And now that I brilliantly bricked that tablet, I just need to tell people to stop trying to contact me over that thing (which wasn't really working in the first place anyway because I wasn't using the tablet that much, but I digress). So. What do you do? You could follow the above "lost my phone" guide and get a new Android or iOS phone to register on Signal again, but that's pretty dumb: I don't want another phone, I already have one. Lo and behold, signal-cli to the rescue!

Disclaimer: no warranty or liability Before following this guide, make sure you remember the license of this website, which specifically has a Section 5 Disclaimer of Warranties and Limitation of Liability. If you follow this guide literally, you might actually get into trouble. You have been warned. All Cats Are Beautiful.

Installing in Docker Because signal-cli is not packaged in Debian (but really should be), I need to bend over backwards to install it. The installation instructions suggest building from source (what is this, GentooBSD?) or installing binary files (what is this, Debiandows?), that's all so last millennium. I want something fresh and fancy, so I went with the extremely legit Docker registry ran by the not-shady-at-all gitlab.com/packaging group which is suspiciously not owned by any GitLab.com person I know of. This is surely perfectly safe.
(Insert long digression on supply chain security here and how Podman is so much superior to Docker. Feel free to dive deep into how RedHat sold out to the nazis or how this is just me ranting about something I don't understand, again. I'm not going to do all the work for you.)
Anyway. The magic command is:
mkdir .config/signal-cli
podman pull registry.gitlab.com/packaging/signal-cli/signal-cli-jre:latest
# lightly hit computer with magic supply chain verification wand
alias signal-cli="podman run --rm --publish 7583:7583 --volume .config/signal-cli:/var/lib/signal-cli --tmpfs /tmp:exec   registry.gitlab.com/packaging/signal-cli/signal-cli-jre:latest --config /var/lib/signal-cli"
At this point, you have a signal-cli alias that should more or less behave as per upstream documentation. Note that it sets up a network service on port 7583 which is unnecessary because you likely won't be using signal-cli's "daemon mode" here, this is a one-shot thing. But I'll probably be reusing those instructions later on, so I figured it might be a safe addition. Besides, it's what the instructions told me to do so I'm blindly slamming my head in the bash pipe, as trained. Also, you're going to have the signal-cli configuration persist in ~/.config/signal-cli there. Again, totally unnecessary.

Re-registering the number Back to our original plan of canceling our Signal account. The next step is, of course, to register with Signal.
Yes, this is a little counter-intuitive and you'd think there would be a "I want off this boat" button on https://signal.org that would do this for you, but hey, I guess that's only reserved for elite hackers who want to screw people over, I mean close their accounts. Mere mortals don't get access to such beauties. Update: a friend reminded me there used to be such a page at https://signal.org/signal/unregister/ but it's mysteriously gone from the web, but still available on the wayback machine although surely that doesn't work anymore. Untested.
To register an account with signal-cli, you first need to pass a CAPTCHA. Those are the funky images generated by deep neural networks that try to fool humans into thinking other neural networks can't break them, and generally annoy the hell out of people. This will generate a URL that looks like:
signalcaptcha://signal-hcaptcha.$UUID.registration.$THIRTYTWOKILOBYTESOFGARBAGE
Yes, it's a very long URL. Yes, you need the entire thing. The URL is hidden behind the Open Signal link, you can right-click on the link to copy it or, if you want to feel like it's 1988 again, use view-source: or butterflies or something. You will also need the phone number you want to unregister here, obviously. We're going to take a not quite random phone number as an example, +18002677468.
Don't do this at home kids! Use the actual number and don't copy-paste examples from random websites!
So the actual command you need to run now is:
signal-cli -a +18002677468 register --captcha signalcaptcha://signal-hcaptcha.$UUID.registration.$THIRTYTWOKILOBYTESOFGARBAGE
To confirm the registration, Signal will send a text message (SMS) to that phone number with a verification code. (Fun fact: it's actually Twilio relaying that message for Signal and that is... not great.) If you don't have access to SMS on that number, you can try again with the --voice option, which will do the same thing with a actual phone call. I wish it would say "Ok boomer" when it calls, but it doesn't. If you don't have access to either, you're screwed. You may be able to port your phone number to another provider to gain control of the phone number again that said, but at that point it's a whole different ball game. With any luck now you've received the verification code. You use it with:
signal-cli -a +18002677468 verify 131213
If you want to make sure this worked, you can try writing to another not random number at all, it should Just Work:
signal-cli -a +18002677468 send -mtest +18005778477
This is almost without any warning on the other end too, which says something amazing about Signal's usability and something horrible about its security.

Unregistering the number Now we get to the final conclusion, the climax. Can you feel it? I'll try to refrain from further rants, I promise. It's pretty simple and fast, just call:
signal-cli -a +18002677468 unregister
That's it! Your peers will now see an "Invite to Signal" button instead of a text field to send a text message.

Cleanup Optionally, cleanup the mess you left on this computer:
rm -r ~/.config/signal-cli
podman image rm registry.gitlab.com/packaging/signal-cli/signal-cli-jre

21 June 2023

Louis-Philippe V ronneau: New Keyboard, Who This?

My old Thinkpad X220 has been slowly dying1 and as much as it makes me sad, it is also showing its age in terms of computational power. As such, I've pre-ordered a Framework 13 (the AMD version) and plan to retire my X220 when I get it. One thing I will miss from that laptop is the keyboard. At work, I dock it (on the amazing Thinkpad dock), which lets me use the keyboard while working on a larger monitor. I could probably replicate this setup with the Framework, but I'm not a fan of trackpads. So I built a keyboard. A nice one. One with a trackpoint. The bare board in its box If you follow Debian Planet, you may have seen the Tex Shinobi when Jonathan Dowland featured it on his blog back in January. It is a Tenkeyless board (saving me precious space at work) and is everything you would want from a old-school Thinkpad keyboard replacement. Since I had no previous experience with "Cherry MX"-style keyboard switches2, I decided to go full-bore and buy the "DIY" model that came unpopulated. A 35 switch tester board To know what model of switches I wanted, I bought a nice switch tester and played with it for a few days. After having thoroughly annoyed my SO (turns out 35 different switches on a little board is an incredible fidget toy), I decided to go with the Gateron Aliaz 70g. They are silent tactile switches, similar to the classic Cherry MX Browns, but with a much nicer sound profile and a much stronger actuation force (55g VS 70g). The end result is somewhat "stiff" keyboard that has a nice "THOCC", while still being relatively silent: perfect for a shared office. The keyboard with the switches, but no keycaps The only thing left to do on this keyboard is to replace the three soldered switches that came pre-installed for the mouse buttons. They are Cherry MX Red low-profile switches and are genuinely terrible3. I will be swapping them for Gateron KS-33 low-profile Blue switches when I get the ones I ordered online. My assembled Tex Shinobi Overall, I am very satisfied with this keyboard and I look forward using it daily when schools starts again in September.

  1. The power button sometimes does not work at all (for minutes?) and the laptop has been shutting down randomly (not a heat issue) more and more often...
  2. I am blessed with an IBM M keyboard at home and am in love with those clicky, very loud buckling springs.
  3. Not only are they linear switches (weird choice for mouse buttons), but they are very inconsistent. All three switches feel different and make different sounds.

18 June 2023

Dirk Eddelbuettel: RcppSpdlog 0.0.13 on CRAN: Small Extensions

Version 0.0.13 of RcppSpdlog is now on CRAN and will be soon be uploaded to Debian too. RcppSpdlog bundles spdlog, a wonderful header-only C++ logging library with all the bells and whistles you would want that was written by Gabi Melman, and also includes fmt by Victor Zverovich. You can learn more at the package documention site. This release adds a small (but handy) accessor generalisation: Instead of calling setup() with two arguments for a label and the logging level we now only require the desired level. We also cleaned up one implementation detail for the stopwatch feature added in January, and simplified the default C++ compilation standard setting. The NEWS entry for this release follows.

Changes in RcppSpdlog version 0.0.13 (2023-06-17)
  • Minor tweak to stopwatch setup avoids pulling in fmt
  • No longer set a C++ compilation standard as the default choices by R are sufficient for the package
  • Add convenience wrapper log_init omitting first argument to log_setup while preserving the interface from the latter
  • Add convenience setup wrappers init and log to API header file spdl.h

Courtesy of my CRANberries, there is also a diffstat report. More detailed information is on the RcppSpdlog page, or the package documention site. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

16 June 2023

John Goerzen: Using git-annex for Data Archiving

In my recent post about data archiving to removable media, I laid out the difference between backing up and archiving, and also said I d evaluate git-annex and dar. This post evaluates git-annex. The next will look at dar, and then I ll make a comparison post. What is git-annex? git-annex is a fantastic and versatile program that does well, it s one of those things that can do so much that it s a bit hard to describe. Its homepage says:
git-annex allows managing large files with git, without storing the file contents in git. It can sync, backup, and archive your data, offline and online. Checksums and encryption keep your data safe and secure. Bring the power and distributed nature of git to bear on your large files with git-annex.
I think the particularly interesting features of git-annex aren t actually included in that list. Among the features of git-annex that make it shine for this purpose, its location tracking is key. git-annex can know exactly which device has which file at which version at all times. Combined with its preferred content settings, this lets you very easily say things like: git-annex can be set to allow a configurable amount of free space to remain on a device, and it will fill it up with whatever copies are necessary up until it hits that limit. Very convenient! git-annex will store files in a folder structure that mirrors the origin folder structure, in plain files just as they were. This maximizes the ability for a future person to access the content, since it is all viewable without any special tool at all. Of course, for things like optical media, git-annex will essentially be creating what amounts to incrementals. To obtain a consistent copy of the original tree, you would still need to use git-annex to process (export) the archives. git-annex challenges In my prior post, I related some challenges with git-annex. The biggest of them quite poor performance of the directory special remote when dealing with many files has been resolved by Joey, git-annex s author! That dramatically improves the git-annex use scenario here! The fixing commit is in the source tree but not yet in a release. git-annex no doubt may still have performance challenges with repositories in the 100,000+-range, but in that order of magnitude it now looks usable. I m not sure about 1,000,000-file repositories (I haven t tested); there is a page about scalability. A few other more minor challenges remain: I worked around the timestamp issue by using the mtree-netbsd package in Debian. mtree writes out a summary of files and metadata in a tree, and can restore them. To save: mtree -c -R nlink,uid,gid,mode -p /PATH/TO/REPO -X <(echo './.git') > /tmp/spec And, after restoration, the timestamps can be applied with: mtree -t -U -e < /tmp/spec Walkthrough: initial setup To use git-annex in this way, we have to do some setup. My general approach is this: Let's get started! I've set all these shell variables appropriately for this example, and REPONAME to "testdata". We'll begin by setting up the metadata-only tracking repo.
$ REPONAME=testdata
$ mkdir "$METAREPO"
$ cd "$METAREPO"
$ git init
$ git config annex.thin true
There is a sort of complicated topic of how git-annex stores files in a repo, which varies depending on whether the data for the file is present in a given repo, and whether the file is locked or unlocked. Basically, the options I use here cause git-annex to mostly use hard links instead of symlinks or pointer files, for maximum compatibility with non-POSIX filesystems such as NTFS and UDF, which might be used on these devices. thin is part of that. Let's continue:
$ git annex init 'local hub'
init local hub ok
(recording state in git...)
$ git annex wanted . "include=* and exclude=$REPONAME/*"
wanted . ok
(recording state in git...)
In a bit, we are going to import the source data under the directory named $REPONAME (here, testdata). The wanted command says: in this repository (represented by the bare dot), the files we want are matched by the rule that says eveyrthing except what's under $REPONAME. In other words, we don't want to make an unnecessary copy here. Because I expect to use an mtree file as documented above, and it is not under $REPONAME/, it will be included. Let's just add it and tweak some things.
$ touch mtree
$ git annex add mtree
add mtree
ok
(recording state in git...)
$ git annex sync
git-annex sync will change default behavior to operate on --content in a future version of git-annex. Recommend you explicitly use --no-content (or -g) to prepare for that change. (Or you can configure annex.synccontent)
commit
[main (root-commit) 6044742] git-annex in local hub
1 file changed, 1 insertion(+)
create mode 120000 mtree
ok
$ ls -l
total 9
lrwxrwxrwx 1 jgoerzen jgoerzen 178 Jun 15 22:31 mtree -> .git/annex/objects/pX/ZJ/...
OK! We've added a file, and it got transformed into a symlink. That's the thing I said we were going to avoid, so:
git annex adjust --unlock-present
adjust
Switched to branch 'adjusted/main(unlockpresent)'
ok
$ ls -l
total 1
-rw-r--r-- 2 jgoerzen jgoerzen 0 Jun 15 22:31 mtree
You'll notice it transformed into a hard link (nlinks=2) file. Great! Now let's import the source data. For that, we'll use the directory special remote.
$ git annex initremote source type=directory directory=$SOURCEDIR importtree=yes \
encryption=none
initremote source ok
(recording state in git...)
$ git annex enableremote source directory=$SOURCEDIR
enableremote source ok
(recording state in git...)
$ git config remote.source.annex-readonly true
$ git config annex.securehashesonly true
$ git config annex.genmetadata true
$ git config annex.diskreserve 100M
$ git config remote.source.annex-tracking-branch main:$REPONAME
OK, so here we created a new remote named "source". We enabled it, and set some configuration. Most notably, that last line causes files from "source" to be imported under $REPONAME/ as we wanted earlier. Now we're ready to scan the source.
$ git annex sync
At this point, you'll see git-annex computing a hash for every file in the source directory. I can verify with du that my metadata-only repo only uses 14MB of disk space, while my source is around 4GB. Now we can see what git-annex thinks about file locations:
$ git-annex whereis less
whereis mtree (1 copy)
8aed01c5-da30-46c0-8357-1e8a94f67ed6 -- local hub [here]
ok
whereis testdata/[redacted] (0 copies)
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
failed
... many more lines ...
So remember we said we wanted mtree, but nothing under testdata, under this repo? That's exactly what we got. git-annex knows that the files under testdata can be found under the "source" special remote, but aren't in any git-annex repo -- yet. Now we'll start adding them. Walkthrough: removable drives I've set up two 500MB filesystems to represent removable drives. We'll see how git-annex works with them.
$ cd $DRIVE01
$ df -h .
Filesystem Size Used Avail Use% Mounted on
acrypt/no-backup/annexdrive01 500M 1.0M 499M 1% /acrypt/no-backup/annexdrive01
$ git clone $METAREPO
Cloning into 'testdata'...
done.
$ cd $REPONAME
$ git config annex.thin true
$ git annex init "test drive #1"
$ git annex adjust --hide-missing --unlock
adjust
Switched to branch 'adjusted/main(hidemissing-unlocked)'
ok
$ git annex sync
OK, that's the initial setup. Now let's enable the source remote and configure it the same way we did before:
$ git annex enableremote source directory=$SOURCEDIR
enableremote source ok
(recording state in git...)
$ git config remote.source.annex-readonly true
$ git config remote.source.annex-tracking-branch main:$REPONAME
$ git config annex.securehashesonly true
$ git config annex.genmetadata true
$ git config annex.diskreserve 100M
Now, we'll add the drive to a group called "driveset01" and configure what we want on it:
$ git annex group . driveset01
$ git annex wanted . '(not copies=driveset01:1)'
What this does is say: first of all, this drive is in a group named driveset01. Then, this drive wants any files for which there isn't already at least one copy in driveset01. Now let's load up some files!
$ git annex sync --content
As the messages fly by from here, you'll see it mentioning that it got mtree, and then various files from "source" -- until, that is, the filesystem had less than 100MB free, at which point it complained of no space for the rest. Exactly like we wanted! Now, we need to teach $METAREPO about $DRIVE01.
$ cd $METAREPO
$ git remote add drive01 $DRIVE01/$REPONAME
$ git annex sync drive01
git-annex sync will change default behavior to operate on --content in a future version of git-annex. Recommend you explicitly use --no-content (or -g) to prepare for that change. (Or you can configure annex.synccontent)
commit
On branch adjusted/main(unlockpresent)
nothing to commit, working tree clean
ok
merge synced/main (Merging into main...)
Updating d1d9e53..817befc
Fast-forward
(Merging into adjusted branch...)
Updating 7ccc20b..861aa60
Fast-forward
ok
pull drive01
remote: Enumerating objects: 214, done.
remote: Counting objects: 100% (214/214), done.
remote: Compressing objects: 100% (95/95), done.
remote: Total 110 (delta 6), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (110/110), 13.01 KiB 1.44 MiB/s, done.
Resolving deltas: 100% (6/6), completed with 6 local objects.
From /acrypt/no-backup/annexdrive01/testdata
* [new branch] adjusted/main(hidemissing-unlocked) -> drive01/adjusted/main(hidemissing-unlocked)
* [new branch] adjusted/main(unlockpresent) -> drive01/adjusted/main(unlockpresent)
* [new branch] git-annex -> drive01/git-annex
* [new branch] main -> drive01/main
* [new branch] synced/main -> drive01/synced/main
ok
OK! This step is important, because drive01 and drive02 (which we'll set up shortly) won't necessarily be able to reach each other directly, due to not being plugged in simultaneously. Our $METAREPO, however, will know all about where every file is, so that the "wanted" settings can be correctly resolved. Let's see what things look like now:
$ git annex whereis less
whereis mtree (2 copies)
8aed01c5-da30-46c0-8357-1e8a94f67ed6 -- local hub [here]
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
ok
whereis testdata/[redacted] (1 copy)
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
If I scroll down a bit, I'll see the files past the 400MB mark that didn't make it onto drive01. Let's add another example drive! Walkthrough: Adding a second drive The steps for $DRIVE02 are the same as we did before, just with drive02 instead of drive01, so I'll omit listing it all a second time. Now look at this excerpt from whereis:
whereis testdata/[redacted] (1 copy)
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
whereis testdata/[redacted] (1 copy)
c4540343-e3b5-4148-af46-3f612adda506 -- test drive #2 [drive02]
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
Look at that! Some files on drive01, some on drive02, some neither place. Perfect! Walkthrough: Updates So I've made some changes in the source directory: moved a file, added another, and deleted one. All of these were copied to drive01 above. How do we handle this? First, we update the metadata repo:
$ cd $METAREPO
$ git annex sync
$ git annex dropunused all
OK, this has scanned $SOURCEDIR and noted changes. Let's see what whereis says:
$ git annex whereis less
...
whereis testdata/cp (0 copies)
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
failed
whereis testdata/file01-unchanged (1 copy)
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
So this looks right. The file I added was a copy of /bin/cp. I moved another file to one named file01-unchanged. Notice that it realized this was a rename and that the data still exists on drive01. Well, let's update drive01.
$ cd $DRIVE01/$REPONAME
$ git annex sync --content
Looking at the testdata/ directory now, I see that file01-unchanged has been renamed, the deleted file is gone, but cp isn't yet here -- probably due to space issues; as it's new, it's undefined whether it or some other file would fill up free space. Let's work along a few more commands.
$ git annex get --auto
$ git annex drop --auto
$ git annex dropunused all
And now, let's make sure metarepo is updated with its state.
$ cd $METAREPO
$ git annex sync
We could do the same for drive02. This is how we would proceed with every update. Walkthrough: Restoration Now, we have bare files at reasonable locations in drive01 and drive02. But, to generate a consistent restore, we need to be able to actually do an export. Otherwise, we may have files with old names, duplicate files, etc. Let's assume that we lost our source and metadata repos and have to restore from scratch. We'll make a new $RESTOREDIR. We'll begin with drive01 since we used it most recently.
$ mv $METAREPO $METAREPO.disabled
$ mv $SOURCEDIR $SOURCEDIR.disabled
$ git clone $DRIVE01/$REPONAME $RESTOREDIR
$ cd $RESTOREDIR
$ git config annex.thin true
$ git annex init "restore"
$ git annex adjust --hide-missing --unlock
Now, we need to connect the drive01 and pull the files from it.
$ git remote add drive01 $DRIVE01/$REPONAME
$ git annex sync --content
Now, repeat with drive02:
$ git remote add drive02 $DRIVE02/$REPONAME
$ git annex sync --content
Now we've got all our content back! Here's what whereis looks like:
whereis testdata/file01-unchanged (3 copies)
3d663d0f-1a69-4943-8eb1-f4fe22dc4349 -- restore [here]
9e48387e-b096-400a-8555-a3caf5b70a64 -- source
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [origin]
ok
...
I was a little surprised that drive01 didn't seem to know what was on drive02. Perhaps that could have been remedied by adding more remotes there? I'm not entirely sure; I'd thought would have been able to do that automatically. Conclusions I think I have demonstrated two things: First, git-annex is indeed an extremely powerful tool. I have only scratched the surface here. The location tracking is a neat feature, and being able to just access the data as plain files if all else fails is nice for future users. Secondly, it is also a complex tool and difficult to get right for this purpose (I think much easier for some other purposes). For someone that doesn't live and breathe git-annex, it can be hard to get right. In fact, I'm not entirely sure I got it right here. Why didn't drive02 know what files were on drive01 and vice-versa? I don't know, and that reflects some kind of misunderstanding on my part about how metadata is synced; perhaps more care needs to be taken in restore, or done in a different order, than I proposed. I initially tried to do a restore by using git annex export to a directory special remote with exporttree=yes, but I couldn't ever get it to actually do anything, and I don't know why. These two cut against each other. On the one hand, the raw accessibility of the data to someone with no computer skills is unmatched. On the other hand, I'm not certain I have the skill to always prepare the discs properly, or to do a proper consistent restore.

15 June 2023

Jonathan Dowland: containers as first-class network citizens

I've moved to having containers be first-class citizens on my home network, so any local machine (laptop, phone,tablet) can communicate directly with them all, but they're not (by default) exposed to the wider Internet. Here's why, and how. After I moved containers from docker to Podman and systemd, it became much more convenient to run web apps on my home server, but the default approach to networking (each container gets an address on a private network between the host server and containers) meant tedious work (maintaining and reconfiguring a HTTP reverse proxy) to make them reachable by other devices. A more attractive arrangement would be if each container received an IP from the range used by my home LAN, and were automatically addressable from any device on it. To make the containers first-class citizens on my home LAN, first I needed to configure a Linux network bridge and attach the host machine's interface to it (I've done that many times before); then define a new Podman network, of type "bridge". podman-network-create (1) serves as reference, but the blog post Exposing Podman containers fully on the network is an easier read (skip past the macvlan bit). I've opted to choose IP addresses for each container by hand. The Podman network is narrowly defined to a range of IPs that are within the subnet that my ISP-provided router uses, but outside the range of IPs that it allocates. When I start up a container by hand for the first time, I choose a free IP from the sub-range by hand and add a line to /etc/avahi/hosts on the parent machine, e.g.
192.168.1.33 octoprint.local
I then start the container specifying that address, e.g.
podman run --rm -d --name octoprint \
        ...
        --network bridge_local --ip 192.168.1.33 \
        octoprint/octoprint
I can now access that container from any device in my house (laptop, phone, tablet...) via octoprint.local. What's next Although it's not a huge burden, it would be nice to not need to statically define the addresses in /etc/avahi/hosts (perhaps via "IPAM"). I've also been looking at WireGuard (which should be the subject of a future blog post) and combining this with that would be worthwhile.

14 June 2023

Freexian Collaborators: Monthly report about Debian Long Term Support, May 2023 (by Roberto C. S nchez)

Like each month, have a look at the work funded by Freexian s Debian LTS offering.

Debian LTS contributors In May, 18 contributors have been paid to work on Debian LTS, their reports are available:
  • Abhijith PA did 6.0h (out of 6.0h assigned and 8.0h from previous period), thus carrying over 8.0h to the next month.
  • Anton Gladky did 6.0h (out of 8.0h assigned and 7.0h from previous period), thus carrying over 9.0h to the next month.
  • Bastien Roucari s did 17.0h (out of 17.0h assigned and 3.0h from previous period), thus carrying over 3.0h to the next month.
  • Ben Hutchings did 17.0h (out of 16.0h assigned and 8.0h from previous period), thus carrying over 7.0h to the next month.
  • Chris Lamb did 18.0h (out of 18.0h assigned).
  • Daniel Leidert did 0.0h (out of 0h assigned and 12.0h from previous period), thus carrying over 12.0h to the next month.
  • Dominik George did 0.0h (out of 0h assigned and 20.34h from previous period), thus carrying over 20.34h to the next month.
  • Emilio Pozuelo Monfort did 32.0h (out of 18.5h assigned and 16.0h from previous period), thus carrying over 2.5h to the next month.
  • Guilhem Moulin did 20.0h (out of 8.5h assigned and 11.5h from previous period).
  • Holger Levsen did 0.0h (out of 0h assigned and 10.0h from previous period), thus carrying over 10.0h to the next month.
  • Lee Garrett did 0.0h (out of 0h assigned and 40.5h from previous period), thus carrying over 40.5h to the next month.
  • Markus Koschany did 34.5h (out of 34.5h assigned).
  • Roberto C. S nchez did 18.25h (out of 20.5h assigned and 11.5h from previous period), thus carrying over 13.75h to the next month.
  • Scarlett Moore did 20.0h (out of 20.0h assigned).
  • Sylvain Beucler did 34.5h (out of 29.0h assigned and 5.5h from previous period).
  • Thorsten Alteholz did 14.0h (out of 14.0h assigned).
  • Tobias Frost did 16.0h (out of 15.0h assigned and 1.0h from previous period).
  • Utkarsh Gupta did 5.5h (out of 5.0h assigned and 26.0h from previous period), thus carrying over 25.5h to the next month.

Evolution of the situation In May, we have released 34 DLAs. Several of the DLAs constituted notable security updates to LTS during the month of May. Of particular note were the linux (4.19) and linux-5.10 packages, both of which addressed a considerable number of CVEs. Additionally, the postgresql-11 package was updated by synchronizing it with the 11.20 release from upstream. Notable non-security updates were made to the distro-info-data database and the timezone database. The distro-info-data package was updated with the final expected release date of Debian 12, made aware of Debian 14 and Ubuntu 23.10, and was updated with the latest EOL dates for Ubuntu releases. The tzdata and libdatetime-timezone-perl packages were updated with the 2023c timezone database. The changes in these packages ensure that in addition to the latest security updates LTS users also have the latest information concerning Debian and Ubuntu support windows, as well as the latest timezone data for accurate worldwide timekeeping. LTS contributor Anton implemented an improvement to the Debian Security Tracker Unfixed vulnerabilities in unstable without a filed bug view, allowing for more effective management of CVEs which do not yet have a corresponding bug entry in the Debian BTS. LTS contributor Sylvain concluded an audit of obsolete packages still supported in LTS to ensure that new CVEs are properly associated. In this case, a package being obsolete means that it is no longer associated with a Debian release for which the Debian Security Team has direct responsibility. When this occurs, it is the responsibility of the LTS team to ensure that incoming CVEs are properly associated to packages which exist only in LTS. Finally, LTS contributors also contributed several updates to packages in unstable/testing/stable to fix CVEs. This helps package maintainers, addresses CVEs in current and future Debian releases, and ensures that the CVEs do not remain open for an extended period of time only for the LTS team to be required to deal with them much later in the future.

Thanks to our sponsors Sponsors that joined recently are in bold.

Next.

Previous.